# Medicare Fraud Detection using Unsupervised Machine Learning

This repository contains code and resources for the research paper titled "Can Machine Learning Target Health Care Fraud? Evidence from Medicare Hospitalizations".

## Citation

If you use this code or find this research useful, please cite:

```bibtex
@article{shekhar2024machine,
  title={Can Machine Learning Target Health Care Fraud? Evidence from Medicare Hospitalizations},
  author={Shekhar, Shubhranshu and Leder-Luis, Jetson and Akoglu, Leman},
  journal={Journal of Policy Analysis and Management},
  year={2025}
}
```

**Working Paper Version:** [Unsupervised Machine Learning for Explainable Health Care Fraud Detection](https://www.nber.org/system/files/working_papers/w30946/w30946.pdf) (NBER Working Paper No. 30946)


## Repository Structure

```
├── src/                    # Source code
│   ├── data_util.py        # Data loading and preprocessing utilities
│   ├── prepeocessing_beneficiary_data.py     # Beneficiary profile
│   ├── preprocessing_icd.py    # Processing provider and ICD distribution
│   ├── module_peer_detector.py     # Peer-based expense anomaly detection
│   ├── module_regression.py      # Fixed effects regression
│   ├── module_subspace_detector.py     # Subspace outlier detectors
│   └── post_preocessing.py     # Post-processing of results
├── data/                  # DoJ press releases data file
└── README.md             # This file
```

## Getting Started

### Prerequisites

```bash
# Required Python packages
pandas
numpy
scikit-learn
[other key dependencies]
```
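
To confirm that the listed packages are available before running anything, a small check along these lines may help. This is a minimal sketch rather than part of the repository: the `REQUIRED_MODULES` mapping and the `missing_packages` helper are hypothetical names, and the mapping covers only the three packages named above.

```python
from importlib.util import find_spec

# pip name -> import name for the packages listed above.
# Note that scikit-learn is imported as "sklearn".
# (Hypothetical mapping; extend it with any other dependencies you need.)
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are installed.")
```

Any package reported as missing can then be installed with `pip install <name>`.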

### Usage
Run the scripts in `src/` in the order listed under Repository Structure.
Each preprocessing step creates intermediate datasets that are subsequently
used by the multi-view detectors.
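
The run order can also be scripted. The sketch below is one possible driver, not something shipped with the repository: `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical names, and it assumes each file can be executed as a standalone script with no command-line arguments. If `data_util.py` is only an imported helper module, it can be dropped from the list.

```python
import subprocess
import sys
from pathlib import Path

# Script order copied from the Repository Structure listing.
# Assumption: every file runs standalone and takes no arguments.
PIPELINE = [
    "data_util.py",
    "prepeocessing_beneficiary_data.py",
    "preprocessing_icd.py",
    "module_peer_detector.py",
    "module_regression.py",
    "module_subspace_detector.py",
    "post_preocessing.py",
]


def build_commands(src_dir="src", python=sys.executable):
    """Build one interpreter command per script, in pipeline order."""
    return [[python, str(Path(src_dir) / name)] for name in PIPELINE]


def run_pipeline(src_dir="src", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(src_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` only prints the commands; `run_pipeline(dry_run=False)` executes them and stops at the first script that exits with an error, so later steps never run on incomplete intermediate data.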


## Data Sources

This research uses the following datasets. Most are public; the Medicare claims data is access-restricted.

### Medicare Claims Data
- **Source:** [ResDAC](https://resdac.org/research-identifiable-files-rif-requests)
- **Description:** Access is restricted and requires a signed Data Use Agreement (DUA).

### Hospital General Information
- **Source:** [CMS](https://data.cms.gov/provider-data/dataset/xubh-q36u)
- **Description:** Hospital characteristics and quality metrics
- **Files:** `Hospital_General_Information.csv`

### Department of Justice Press Releases
- **Source:** [DOJ Press Releases](https://www.justice.gov/news)
- **Files:** `doj_oausa_combined_with_id.json` in data directory
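
For a first look at the press release file, something like the following may be useful. It is a minimal sketch that assumes the file is a single JSON document (not JSON Lines) and deliberately makes no assumption about its schema; `load_press_releases` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_press_releases(path="data/doj_oausa_combined_with_id.json"):
    """Load the press release file, assuming it holds one JSON document."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def summarize(obj):
    """Describe the top-level shape of the data without assuming a schema."""
    if isinstance(obj, list):
        return f"list with {len(obj)} records"
    if isinstance(obj, dict):
        return f"dict with {len(obj)} top-level keys"
    return type(obj).__name__
```

Running `print(summarize(load_press_releases()))` from the repository root shows whether the file is a list of records or a keyed mapping, which is worth knowing before writing any field-level code.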

### Additional Datasets
- **Provider of Services File:** [CMS](https://data.cms.gov/provider-characteristics/hospitals-and-other-facilities/provider-of-services-file-hospital-non-hospital-facilities)
- **CMS Medicare Inpatient Hospitals - by Provider and Service:** [CMS](https://data.cms.gov/provider-summary-by-type-of-service/medicare-inpatient-hospitals/medicare-inpatient-hospitals-by-provider-and-service)
